
[n8n] Fix metric mappings and add full v2 metric coverage #23635

Open
AAraKKe wants to merge 23 commits into master from aarakke/fix-n8n-metrics

Conversation

AAraKKe (Contributor) commented May 8, 2026

What does this PR do?

Overhauls the n8n integration's metric mapping, test environment, fixtures, and public documentation so the check matches what n8n actually emits across both supported major versions.

  • Metric map and metadata. Fixes the incorrect workflow_executions_duration_seconds mapping, removes mappings that n8n does not emit, restores valid families that the check was silently dropping (queue.job.dequeued, nodejs.active.requests), and adds the missing families verified live against n8n 1.118.1 and n8n 2.19.5.
  • n8n 2.x metric coverage. Adds the 2.x-only families: workflow.execution.duration.seconds.*, the audit.workflow.* lifecycle counters, embed.login.*, token.exchange.*, process.pss.bytes, the opt-in workflow_statistics_* gauges (workflows.total, users.total, production.executions, ...), and the new VM-isolated expression engine family (expression.evaluation.duration.seconds.*, expression.code.cache.*, expression.pool.*). metadata.csv calls out the version and env flag requirements for each.
  • Dynamic event-bus counters. Maps the broader n8n event-bus surface: roughly 70 dynamic counters covering audit (user, credentials, package, variable, execution data), AI node, runner, worker lifecycle, and workflow cancellation events. The test compose cannot realistically exercise these families (auth flows, AI invocations, package installs, etc.), so the new counters are added to the integration's rare-event set and excluded from the live symmetric-inclusion metadata assertion. The unit fixture carries synthetic samples so the metric map stays validated bidirectionally there. Counter names that n8n declares but its own prom-client exporter silently rejects (anything containing a hyphen, e.g. external-secrets, token-exchange, role-mapping, cluster.*) are intentionally not mapped and are noted as such in metrics.py and the README.
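The hyphen rejection noted above has a simple cause: Prometheus metric names must match the grammar `[a-zA-Z_:][a-zA-Z0-9_:]*`, and n8n's prom-client exporter enforces that rule at registration time. A stdlib sketch of the same check:

```python
import re

# Prometheus metric names must match this grammar; prom-client enforces
# the same rule at registration, so event names containing a hyphen
# (external-secrets, token-exchange, role-mapping, ...) are silently dropped.
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def is_valid_metric_name(name: str) -> bool:
    return METRIC_NAME_RE.match(name) is not None


print(is_valid_metric_name("n8n_queue_job_stalled_total"))  # True
print(is_valid_metric_name("n8n_external-secrets_total"))   # False (hyphen)
```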
  • Worker scrape coverage. Documents and exercises scraping the n8n worker process as a separate Datadog instance. The conftest brings up both main and worker, tagged n8n_process:main / n8n_process:worker, so the worker-only families (node.started, node.finished, queue.job.dequeued, runner.task.requested) are covered end-to-end.
  • Test matrix and environment. Runs the integration in queue mode against both n8n 1.118.1 and 2.19.5 via the hatch matrix. The compose file pulls n8nio/n8n:${N8N_VERSION} directly, allocates dynamic host ports to avoid CI/local collisions, and uses port 5680 for the worker since n8n 2.x reserves 5679 for the task-runner broker.
  • Setup flow. Rebuilds dd_environment around WaitFor and CheckEndpoints conditions: wait for n8n's /healthz, import workflows via the n8n CLI, activate them, restart so webhooks register, then confirm both /metrics endpoints are reachable. Workflow IDs are read from the bind-mounted JSON, traffic generation runs in the fixture body, and the workflow-started wait raises with the last seen samples so timeouts are self-debugging.
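A minimal sketch of that condition chain. The URLs, compose path, and the activation helper are illustrative placeholders; `docker_run`, `WaitFor`, and `CheckEndpoints` are the real `datadog_checks.dev` APIs:

```python
import pytest

from datadog_checks.dev import docker_run
from datadog_checks.dev.conditions import CheckEndpoints, WaitFor

MAIN_URL = "http://localhost:5678"    # illustrative; real ports are dynamic
WORKER_URL = "http://localhost:5680"


def _activate_imported_workflows():
    ...  # import via the n8n CLI, activate, restart so webhooks register


@pytest.fixture(scope="session")
def dd_environment():
    with docker_run(
        "docker-compose.yaml",
        conditions=[
            CheckEndpoints([f"{MAIN_URL}/healthz"]),
            WaitFor(_activate_imported_workflows),
            CheckEndpoints([f"{MAIN_URL}/metrics", f"{WORKER_URL}/metrics"]),
        ],
    ):
        yield {"openmetrics_endpoint": f"{MAIN_URL}/metrics"}
```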
  • Lab harness for dashboards and monitors. Adds tests/lab/ with its own docker-compose.yaml, traffic generator, sample workflows, and config so developers can run a long-lived n8n stack against a real Datadog Agent (autodiscovery for stdout logs, bind-mounted volume for the event-bus log files) and iterate on dashboards / monitors with real data. Gated on N8N_IS_LAB=true; integration tests are unaffected.
  • Rare-event metrics. Keeps real but opportunistic metrics (auth failure counters, audit lifecycle counters that only fire on state transitions, the libuv nodejs.active.requests gauge gated on in-flight handles, the expression-engine family gated on N8N_EXPRESSION_ENGINE=vm + N8N_EXPRESSION_ENGINE_OBSERVABILITY_ENABLED=true) in the metric map and metadata, with synthetic samples in the unit fixtures, and excludes them from the live symmetric assertion via the public exclude= parameter of assert_metrics_using_metadata.
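The rare-event exclusion can be sketched like this; the fixture signatures and the contents of `RARE_EVENT_METRIC_NAMES` are illustrative, while `assert_metrics_using_metadata` and its `exclude=` parameter are the public aggregator-stub API:

```python
from datadog_checks.dev.utils import get_metadata_metrics

# Rare-event families stay in the metric map and metadata, but the test
# environment cannot trigger them, so they are excluded from the live
# symmetric-inclusion assertion (the unit fixture still covers them).
RARE_EVENT_METRIC_NAMES = {
    'n8n.queue.job.stalled',
    'n8n.embed.login.failures',
    # ... roughly 70 dynamic event-bus counters in the real set
}


def test_all_metadata_metrics_emitted(aggregator, dd_run_check, check):
    dd_run_check(check)
    aggregator.assert_metrics_using_metadata(
        get_metadata_metrics(),
        exclude=list(RARE_EVENT_METRIC_NAMES),
    )
```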
  • Readiness behavior. n8n.readiness.check is still submitted, but the OpenMetrics scrape is no longer gated on /healthz/readiness. Metrics keep flowing when readiness reports degraded so queue depth, process state, and other SRE-relevant signals are not lost during incidents. The OpenMetrics base check's own scrape service check still surfaces scrape failures.
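The resulting behavior can be sketched as a standalone helper (hypothetical names; the real code is a method on the check class, and per the review discussion below it treats any 2xx response as ready):

```python
class RequestError(Exception):
    """Stand-in for requests.RequestException in this sketch."""


def submit_readiness(gauge, http_get, endpoint, tags):
    # The OpenMetrics scrape is NOT gated on this result: metrics keep
    # flowing even when readiness reports degraded.
    try:
        response = http_get(endpoint)
    except RequestError:
        # Connection-level failures carry no response object, hence no code.
        gauge('readiness.check', 0, tags=tags + ['status_code:none'])
        return
    is_ready = 200 <= response.status_code < 300
    gauge(
        'readiness.check',
        1 if is_ready else 0,
        tags=tags + [f'status_code:{response.status_code}'],
    )
```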
  • Public documentation. README updates cover the version-specific metric availability, the env flags that gate each family (N8N_METRICS_INCLUDE_WORKFLOW_STATISTICS, N8N_METRICS_INCLUDE_WORKFLOW_EXECUTION_DURATION, N8N_METRICS_INCLUDE_QUEUE_METRICS, N8N_EXPRESSION_ENGINE, N8N_EXPRESSION_ENGINE_OBSERVABILITY_ENABLED), the requirement to keep raw_metric_prefix in sync with a customized N8N_METRICS_PREFIX, the 5679-reserved-by-task-runner-broker caveat for queue mode, the relocated event-bus log file path under the n8n user folder, and the dynamic event-bus surface with an extra_metrics example for future n8n events.

Motivation

Issue #23633 reported that the integration exposed the wrong Datadog metric name for n8n workflow execution duration. Validating the check against live n8n containers showed a broader gap: some mapped metrics were invented or stale, several real metrics were missing, and the test environment did not exercise queue mode, worker metrics, or version-specific differences.

This PR makes the integration empirically grounded against both the older supported n8n line and the current 2.x line, and gives developers a lab harness to keep iterating on n8n monitoring artifacts with real telemetry.

Validation

  • ddev test -fs n8n
  • ddev validate config -s n8n
  • ddev validate models -s n8n
  • ddev validate metadata n8n
  • ddev validate readmes n8n
  • ddev --no-interactive test n8n
  • ddev env test --dev n8n py3.13-1
  • ddev env test --dev n8n py3.13-2

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

- Drop fabricated metric names that n8n never emitted; map only what is empirically present.
- Add the n8n 2.x metric families: workflow.execution.duration histogram, audit.workflow.*, embed.login.*, token.exchange.*, process.pss.bytes, runner.task.requested, and the workflow_statistics gauges.
- Add worker-only families (node.started, node.finished, queue.job.dequeued, runner.task.requested) by introducing a worker-scrape instance.
- Stop gating the OpenMetrics scrape on /healthz/readiness; emit n8n.readiness.check unconditionally so metrics still flow when the readiness endpoint is unhealthy.
- Replace the custom Dockerfile with a direct n8nio/n8n image reference and parameterise the version via hatch.toml so the test matrix can run against both 1.118.1 and 2.19.5.
- Allocate free host ports via datadog_checks.dev.utils.find_free_ports and forward them through docker_run env_vars to avoid port collisions on re-runs.
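Dynamic port allocation of this kind can be sketched with the stdlib alone (the real code uses the `datadog_checks.dev` helper named above):

```python
import socket


def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


# The allocated ports are then forwarded into the compose file, e.g.
# docker_run(..., env_vars={"N8N_MAIN_PORT": str(find_free_port())}).
```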
@AAraKKe AAraKKe added the qa/skip-qa Automatically skip this PR for the next QA label May 8, 2026
@AAraKKe AAraKKe requested review from a team as code owners May 8, 2026 10:53
dd-octo-sts Bot (Contributor) commented May 8, 2026

⚠️ The qa/skip-qa label has been added with shippable changes

The following files, which will be shipped with the agent, were modified in this PR and
the qa/skip-qa label has been added.

You can ignore this if you are sure the changes in this PR do not require QA. Otherwise,
consider removing the label.

List of modified files that will be shipped with the agent
n8n/changelog.d/23635.changed
n8n/datadog_checks/n8n/check.py
n8n/datadog_checks/n8n/data/conf.yaml.example
n8n/datadog_checks/n8n/metrics.py
n8n/hatch.toml

@AAraKKe AAraKKe marked this pull request as draft May 8, 2026 11:07
codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 88.44221% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.02%. Comparing base (996b3d5) to head (a0e259b).
⚠️ Report is 22 commits behind head on master.


datadog-prod-us1-3 Bot commented May 8, 2026

Tests


⚠️ Warnings

❄️ 2 New flaky tests detected

test_all_metadata_metrics_emitted from test_integration.py
RetryError[<Future at 0x7f1260057d90 state=finished raised RuntimeError>]
test_readiness_check_metric from test_integration.py
RetryError[<Future at 0x7f1260057d90 state=finished raised RuntimeError>]

ℹ️ Info

No other issues found

🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 88.44%
Overall Coverage: 90.17%

Commit SHA: a0e259b

@AAraKKe AAraKKe marked this pull request as ready for review May 8, 2026 12:44
chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1be3b3dc6f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 83 to 86
'queue_job_completed': 'queue.job.completed',
'queue_job_delayed_total': 'queue.job.delayed.total',
'queue_job_dequeued': 'queue.job.dequeued',
'queue_job_enqueued': 'queue.job.enqueued',
'queue_job_failed': 'queue.job.failed',

P2: Add the stalled queue counter mapping

In n8n 2.x queue mode, a stalled job event is emitted as n8n.queue.job.stalled, which the Prometheus service exposes as n8n_queue_job_stalled_total; with this map limited to completed/dequeued/enqueued/failed, that counter is silently ignored even when message event bus metrics are enabled. Since this block adds queue-job coverage, include queue_job_stalled (and corresponding metadata) so stalled jobs are collected.

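The suggested fix amounts to one more entry in the metric map; a sketch of the queue-job block with the addition (the real map in `n8n/datadog_checks/n8n/metrics.py` is much larger):

```python
QUEUE_JOB_METRICS = {
    'queue_job_completed': 'queue.job.completed',
    'queue_job_delayed_total': 'queue.job.delayed.total',
    'queue_job_dequeued': 'queue.job.dequeued',
    'queue_job_enqueued': 'queue.job.enqueued',
    'queue_job_failed': 'queue.job.failed',
    # Suggested addition: n8n 2.x queue mode exposes this counter
    # as n8n_queue_job_stalled_total.
    'queue_job_stalled': 'queue.job.stalled',
}
```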

A long-running n8n simulation that layers on top of the integration test
environment so a real Datadog Agent can ship metrics to a Datadog org for
dashboard / monitor iteration.

- tests/lab/workflows/: five lab-only workflow JSONs covering distinct shapes
  (fast, slow Wait node, always-fail Code, flaky 30%, four-step chain).
- tests/lab/traffic_generator.py: click CLI (start/generate/stop) that runs
  ddev env start --base, copies + imports + activates the lab workflows,
  restarts n8n, and drives a configurable async traffic mix against the
  webhooks and REST API.
- tests/lab/config.yaml: webhook + REST probabilities and tick / reload
  intervals; hot-reloaded while the generator runs.
- tests/lab/.ddev.toml: pins the lab to an `n8nlab` ddev org.
- tests/lab/run_lab.sh: bash entrypoint with an EXIT trap so Ctrl+C always
  runs lab:stop.
- hatch.toml: new [envs.lab] env with click/httpx/pyyaml/rich and
  start/generate/stop scripts.
@AAraKKe AAraKKe marked this pull request as draft May 8, 2026 12:59
@AAraKKe AAraKKe marked this pull request as ready for review May 11, 2026 15:00
AAraKKe (Contributor, Author) commented May 11, 2026

@codex review

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3db752d2ac


Comment thread n8n/changelog.d/23635.added Outdated
Comment on lines +1 to +3
Update the n8n metric coverage and test harness, verified live against n8n 1.118.1 and 2.19.5:

- Add missing common event-driven metrics: ``audit.workflow.archived``, ``audit.workflow.created``, ``audit.workflow.deleted``, ``audit.workflow.unarchived``, ``audit.workflow.updated``, and ``queue.job.stalled``.

P2: Keep the changelog fragment to one line

AGENTS.md's Changelog Management instructions say to "Write a single line describing the change," but this new fragment is a multi-line bulleted release note that continues through line 15. This violates the repo's changelog format and will produce an oversized release entry for a single change; please collapse it to one concise sentence ending with a period.


AAraKKe (Contributor, Author):

This is not mandatory, and the format renders the bullet points correctly. This is the draft:

Unreleased / 2026-05-11

Added:

  • Update the n8n metric coverage and test harness, verified live against n8n 1.118.1 and 2.19.5:

    • Add missing common event-driven metrics: audit.workflow.archived, audit.workflow.created, audit.workflow.deleted, audit.workflow.unarchived, audit.workflow.updated, and queue.job.stalled.
    • Add n8n 2.x workflow duration metrics: workflow.execution.duration.seconds.*.
    • Add n8n 2.x audit workflow metrics: audit.workflow.activated, audit.workflow.deactivated, audit.workflow.executed, audit.workflow.resumed, audit.workflow.version.updated, and audit.workflow.waiting.
    • Add n8n 2.x embed login metrics: embed.login.requests and embed.login.failures.
    • Add n8n 2.x token exchange metrics: token.exchange.requests, token.exchange.failures, token.exchange.identity.linked, and token.exchange.jit.provisioning.
    • Add n8n 2.x process memory metric: process.pss.bytes.
    • Add n8n 2.x workflow statistics metrics: production.executions, production.root.executions, manual.executions, users.total, enabled.users, workflows.total, and credentials.total.
    • Restore valid metrics that the integration was previously dropping: queue.job.dequeued, nodejs.active.requests.
    • Add worker-only families node.started, node.finished, queue.job.dequeued, and runner.task.requested and document scraping the n8n worker process as a separate Datadog instance.
    • Remove the gating of OpenMetrics scraping on /healthz/readiness - n8n.readiness.check is still submitted, but metrics keep flowing when readiness reports degraded so SRE-relevant signals (queue depth, process state) are not lost during incidents.
    • Document version-specific metric availability and the n8n env flags that gate them (N8N_METRICS_INCLUDE_WORKFLOW_STATISTICS, N8N_METRICS_INCLUDE_WORKFLOW_EXECUTION_DURATION, N8N_METRICS_INCLUDE_QUEUE_METRICS).
    • Use the actual /metrics URL in the openmetrics_endpoint example in conf.yaml.example/spec.yaml (was previously the host root, which silently mismatched the scrape path the check uses).
    • Document that raw_metric_prefix in conf.yaml must be kept in sync with a customised N8N_METRICS_PREFIX for the check to recognise the exposed metric names. (#23635)

Fixed:

  • Fix default raw metric prefix. (#23598)

AAraKKe added 3 commits May 11, 2026 18:44
n8n 2.x ships a new VM-isolated expression engine in @n8n/expression-runtime
that registers its own Prometheus metrics under the n8n_expression_* prefix.
The metrics are gated on N8N_EXPRESSION_ENGINE=vm and
N8N_EXPRESSION_ENGINE_OBSERVABILITY_ENABLED=true, neither of which defaults
to on, so live containers do not emit samples unless explicitly opted in.

Map the family in datadog_checks/n8n/metrics.py, add metadata.csv rows with
the version + flag requirements documented, add synthetic samples to the
unit fixtures so check_symmetric_inclusion stays green, list the metrics in
the V2_ONLY / RARE_EVENT sets used by the integration assertions, and call
out the env vars in the README's version-specific block.
The lab previously shared tests/docker/docker-compose.yaml with the test
env and drove workflow import through docker exec in traffic_generator.py.
That coupled two consumers with different port expectations (the test env
uses find_free_ports for parallel safety; the lab needs a fixed URL for
docs, agent config, and the traffic generator) and put workflow lifecycle
in two places.

Add tests/lab/docker-compose.yaml that hardcodes 5678/5680 and bind-mounts
both the test fixtures and the lab workflows under /workflows/. Gate the
compose + port selection in tests/common.py on N8N_IS_LAB so the same
conftest serves both modes. Move workflow import/activate into conftest
(scanning the bind-mount, reading stable ids from JSON), and add a
lab-only logs block + docker_volumes yield so the Datadog Agent picks up
n8n stdout via autodiscovery and the event-bus log files via the data
volume. Drop the docker-exec workflow import from traffic_generator.py
now that conftest owns it. Update the README log-collection section to
reflect that event-bus logs live under the n8n user folder rather than
N8N_LOG_FILE_LOCATION.
- Tighten _generate_workflow_traffic success check to == 200 so a
  webhook that responds 4xx (e.g. not yet registered after restart) does
  not falsely count as a healthy workflow run; capture last_status /
  last_exc and surface them in the RuntimeError so CI failures point at
  the real cause.
- Replace the bespoke time.monotonic() wait loops with WaitFor + raise
  predicates (the dominant pattern across integrations-core). Restructure
  dd_environment conditions so the docker_run condition chain runs:
  wait for /healthz, activate workflows, then assert /metrics reachable
  on main + worker. Workflow-started wait stays inline since
  _generate_workflow_traffic is not idempotent.
- Drop drop_rare_event_metrics; pass the public exclude= parameter to
  assert_metrics_using_metadata so we don't reach into AggregatorStub
  internals.
- Replace bare try/except RequestException: pass blocks with
  contextlib.suppress.
- Parametrize the two unit-fixture metadata tests; add the missing
  pytestmark = pytest.mark.unit and a comment explaining why the
  unit assertion is version-pinned to major=2.
- Re-word the lab traffic generator reload-failure messages so it's
  clear the lab keeps running with the previous config.
- Add N8N_METRICS_INCLUDE_WORKFLOW_EXECUTION_DURATION to the
  README's version-specific block and to the changelog flag list; indent
  the changelog sub-bullets so towncrier nests them under the wrapping
  bullet.
@AAraKKe AAraKKe marked this pull request as draft May 11, 2026 16:51
AAraKKe added 6 commits May 11, 2026 18:52
Use the public exclude= parameter on assert_metrics_using_metadata,
matching test_integration.py. test_e2e.py was missed in the earlier
review-feedback commit.
- Remove stray scratch notes accidentally committed at the end of the
  file (numbered questions and a changelog-process note that didn't
  belong in the public README).
- Sentence-case the 'Data collected' and 'Service checks' headings.
- Replace hyphen-as-em-dash usage (' - ') by splitting into separate
  sentences.
- Replace slash-as-and/or in lists and tag descriptions:
  'enqueued/dequeued/completed/failed/stalled counters' -> spelled-out
  list; 'result:success/failure' -> 'result:success or result:failure';
  'stdout/stderr' -> 'stdout and stderr'.
In the previous refactor _generate_workflow_traffic and the
_workflow_started_non_zero wait were moved into the body of the
dd_environment context manager. That made them vulnerable to fixture
re-invocation paths (e.g. session teardown or flaky-plugin retry) that
fired the body code against torn-down containers, producing a setup
error after the e2e test had already passed.

Put both back into conditions=[...]. That keeps them inside docker_run's
set_up() retry envelope (attempts=2 in CI), and they are no longer
exposed to the post-yield teardown path. The post-restart /healthz wait
moves back inside _activate_imported_workflows so the function stays
self-contained as a condition. Restore the (instances, E2E_METADATA)
tuple yield for non-lab mode so the e2e Agent container still gets the
docker_volumes mount it expects.
- conftest.py: parse n8n_workflow_started_total samples as floats instead
  of string-matching ' 0', so '0.0' / '0e+0' counter values are not
  treated as non-zero and OpenMetrics '# HELP'/'# TYPE' comment lines
  that share the prefix are skipped.
- common.py: collapse the get_all_metadata_metrics passthrough into
  get_metadata_metrics_for_version (update integration + e2e call sites)
  and document the intentional V2_ONLY / RARE_EVENT overlap so future
  contributors do not assume the duplication is accidental.
- check.py: cache the readiness endpoint with functools.cached_property
  (it is derived from immutable config) and parameterise the dict return
  / argument types as dict[str, Any].
- traffic_generator.py: scope the asyncio.Event and current config to
  _run_traffic instead of holding them at module level, threading both
  through _config_reloader. Switches the SIGINT/SIGTERM hook to
  loop.add_signal_handler so a second 'generate' invocation in the same
  process starts from a clean state.
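The conftest.py sample-parsing fix above can be sketched as a small helper over the raw OpenMetrics payload (assumed sample format; labeled samples also work because the value is always the last space-separated field):

```python
def workflow_started_nonzero(metrics_text: str) -> bool:
    """Return True if any n8n_workflow_started_total sample is > 0."""
    for line in metrics_text.splitlines():
        # Skip '# HELP' / '# TYPE' comment lines that share the prefix.
        if line.startswith("#"):
            continue
        if line.startswith("n8n_workflow_started_total"):
            # Parse as a float so '0.0' and '0e+0' still count as zero,
            # unlike the old string match on ' 0'.
            value = float(line.rsplit(" ", 1)[1])
            if value > 0:
                return True
    return False
```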
- conftest.py: move the worker CheckEndpoints to after _activate_imported_workflows
  so any cascade from the n8n main restart is caught before downstream conditions
  scrape the worker.
- test_unit.py: import the requests module and reference
  requests.ConnectionError at the call site so the builtin ConnectionError
  name is not shadowed for the rest of the module.
- traffic_generator.py: extract _make_output_table() so the table schema lives
  in one place and _print_row() only owns row data.
On n8n 2.x, /healthz comes back after `docker compose restart n8n` before
n8n has finished re-registering the active workflows' webhook routes. The
existing WaitFor(_n8n_healthy) inside _activate_imported_workflows was
satisfied while /webhook/test still returned 404, so _generate_workflow_traffic
raced the registration and failed with last_status=404.

Add a second WaitFor poll on the integration-test webhook itself so the
registration is observed before downstream conditions run. v1 happens to
register fast enough that the gap is not observable there, but the extra
check costs at most one poll on the happy path.
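The registration wait described above can be sketched as a plain polling loop (stdlib only; the real implementation is a `WaitFor` condition, and the webhook URL here is illustrative):

```python
import time
import urllib.error
import urllib.request


def wait_for_webhook(url: str, attempts: int = 30, wait: float = 1.0) -> None:
    """Poll the webhook until n8n has re-registered its route."""
    last_status = None
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url) as resp:
                if resp.status == 200:
                    return
                last_status = resp.status
        except urllib.error.HTTPError as e:
            last_status = e.code  # 404 until the route is re-registered
        except urllib.error.URLError:
            last_status = None  # container not accepting connections yet
        time.sleep(wait)
    # Surface the last observation so timeouts are self-debugging.
    raise RuntimeError(f"webhook never registered; last_status={last_status}")
```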
@AAraKKe AAraKKe marked this pull request as ready for review May 11, 2026 17:56
chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1f407d5ae4


'workflow_execution_duration_seconds': 'workflow.execution.duration.seconds', # n8n 2.x+
'workflow_failed': 'workflow.failed',
'workflow_started': 'workflow.started',
'workflow_success': 'workflow.success',

P2: Map canceled workflow events

In n8n 2.x deployments where a workflow execution is canceled and message event bus metrics are enabled, n8n emits the dynamic counter n8n_workflow_cancelled_total, but this workflow event block only maps failed/started/success. Because unmapped OpenMetrics counters are skipped, canceled executions remain invisible even though this change expands v2 workflow/event coverage; please add workflow_cancelled and its metadata entry.


hestonhoffman previously approved these changes May 11, 2026
HadhemiDD previously approved these changes May 12, 2026
Map the broader n8n event-bus surface (~45 dynamic counters) covering
audit (user, credentials, package, variable, execution data), AI node,
runner, and workflow cancellation events, plus execution throttling.
Counter names rejected by n8n's own prom-client validation (hyphenated
families such as external-secrets, token-exchange, role-mapping, and
cluster) are intentionally not mapped and called out in metrics.py.

The integration test environment cannot realistically exercise these
families end to end, so each new metric is documented as best-effort
in metadata.csv and added to RARE_EVENT_METRIC_NAMES. The unit fixture
carries synthetic samples so the metric map stays validated. README
covers the dynamic-counter scope and shows an extra_metrics example
for users to add events from future n8n releases.
@temporal-github-worker-1 temporal-github-worker-1 Bot dismissed stale reviews from hestonhoffman and HadhemiDD May 12, 2026 08:56

Review from hestonhoffman is dismissed. Related teams and files:

  • documentation
    • n8n/README.md
    • n8n/metadata.csv

Review from HadhemiDD is dismissed. Related teams and files:

  • agent-integrations
    • n8n/README.md
    • n8n/changelog.d/23635.added
    • n8n/datadog_checks/n8n/metrics.py
    • n8n/metadata.csv
    • n8n/tests/common.py
    • n8n/tests/fixtures/n8n.txt
    • n8n/tests/fixtures/n8n_custom.txt
Comment thread n8n/changelog.d/23635.added Outdated
Comment on lines +1 to +8
Overhaul the n8n metric coverage and test harness, verified live against n8n 1.118.1 and 2.19.5:

- Expand baseline metric coverage with the event-driven counters that were previously missing or mis-mapped (``audit.workflow.*``, ``queue.job.stalled``, ``queue.job.dequeued``, ``nodejs.active.requests``).
- Add the n8n 2.x metric families: workflow execution duration histogram, audit workflow lifecycle counters, embed and token exchange auth counters, workflow statistics (executions, users, workflows, credentials), VM-isolated expression engine metrics, and ``process.pss.bytes``. Each family is gated on the corresponding n8n env flag; see the README for the full matrix.
- Map the broader event-bus surface (around 70 dynamic counters) so audit, AI node, user, package, variable, runner, worker, and workflow cancellation events are picked up automatically when n8n emits them. The integration test environment cannot exercise these families end to end, so they are documented as rare-event metrics and excluded from the symmetric metadata assertion.
- Add worker-only families (``node.started``, ``node.finished``, ``queue.job.dequeued``, ``runner.task.requested``) and document scraping the n8n worker process as a separate Datadog instance.
- Stop gating OpenMetrics scraping on ``/healthz/readiness``: ``n8n.readiness.check`` is still submitted, but metrics keep flowing when readiness reports degraded so SRE-relevant signals (queue depth, process state) are not lost during incidents.
- Fix the ``openmetrics_endpoint`` example in ``conf.yaml.example`` / ``spec.yaml`` to use the actual ``/metrics`` URL (the host root would silently mismatch the scrape path) and document that ``raw_metric_prefix`` must be kept in sync with a customised ``N8N_METRICS_PREFIX``.
Member commented:

Suggestion: this feels a bit too long and technical, could we make it shorter and more customer-friendly?

AAraKKe (Contributor, Author):

I tried hahaha, this is the third version, but there are so many things that change in this version that shortening it more seems we are leaving too much information out. Let me give it another round.

response = self.http.get(endpoint)
except RequestException as e:
self.log.warning("Could not reach n8n readiness endpoint %s: %s", endpoint, e)
self.gauge('readiness.check', 0, tags=tags + ['status_code:none'])
Member commented:

Suggested change
self.gauge('readiness.check', 0, tags=tags + ['status_code:none'])
response = getattr(e, "response", None)
status_code = getattr(response, "status_code", None) or "none"
self.gauge('readiness.check', 0, tags=tags + [f"status_code:{status_code}"])

nit: could be good to add the status_code when it's available (HTTP error).

AAraKKe (Contributor, Author):

it is, isn't it? Or am I misunderstanding your suggestion?

        is_ready = response.status_code == 200
        self.gauge(
            'readiness.check',
            1 if is_ready else 0,
            tags=tags + [f'status_code:{response.status_code}'],
        )

Member commented:

I'm suggesting we set it on the failure path, inside the:

 except RequestException as e:
            self.log.warning("Could not reach n8n readiness endpoint %s: %s", endpoint, e)
            self.gauge('readiness.check', 0, tags=tags + ['status_code:none'])

AAraKKe (Contributor, Author):
The reason will be displayed to describe this comment to others. Learn more.

Aaah, ok. When there is a RequestException, we do not get a response because the connection was never established (docs). We could get one if we called raise_for_status, which we do not do here. The only exceptions raised are ConnectionError and TimeoutError, which do not carry the status code.

Any other error (non 2xx) goes through the other branch where we add the code in the tag.

I checked, just in case the wrapper was doing the raise_for_status internally. It is, but only if we have an auth_token and it needs refreshing; whatever the final result is, it is still returned without the raise_for_status call. Here.

The failure path here carries no response. I can apply your suggestion since it is non-breaking, but I believe it is dead code and an extra attribute lookup we could save on every run. Let me know if you still prefer to apply it.

Member commented:

Ah I see, I missed that!
In that case, we could change the is_ready to be valid for anything in the 2xx range?

AAraKKe (Contributor, Author):

Sounds good, updated now.

Comment thread n8n/README.md
Comment thread n8n/README.md
NouemanKHAL previously approved these changes May 12, 2026
The readiness gauge now reports 1 for any HTTP 2xx response on
/healthz/readiness, not only 200. Rename the changelog entry from
.added to .changed so the next release is a major bump, reflecting
the breadth of the integration overhaul.
@temporal-github-worker-1 temporal-github-worker-1 Bot dismissed NouemanKHAL’s stale review May 13, 2026 06:35

Review from NouemanKHAL is dismissed. Related teams and files:

  • agent-integrations
    • n8n/changelog.d/23635.changed
    • n8n/datadog_checks/n8n/check.py
    • n8n/tests/test_unit.py
github-actions Bot (Contributor) commented:

⚠️ Major version bump
The changelog type changed or removed was used in this Pull Request, so the next release will bump major version. Please make sure this is a breaking change, or use the fixed or added type instead.

dd-octo-sts Bot (Contributor) commented May 13, 2026

Validation Report

All 20 validations passed.

Validation Description Status
agent-reqs Verify check versions match the Agent requirements file
ci Validate CI configuration and Codecov settings
codeowners Validate every integration has a CODEOWNERS entry
config Validate default configuration files against spec.yaml
dep Verify dependency pins are consistent and Agent-compatible
http Validate integrations use the HTTP wrapper correctly
imports Validate check imports do not use deprecated modules
integration-style Validate check code style conventions
jmx-metrics Validate JMX metrics definition files and config
labeler Validate PR labeler config matches integration directories
legacy-signature Validate no integration uses the legacy Agent check signature
license-headers Validate Python files have proper license headers
licenses Validate third-party license attribution list
metadata Validate metadata.csv metric definitions
models Validate configuration data models match spec.yaml
openmetrics Validate OpenMetrics integrations disable the metric limit
package Validate Python package metadata and naming
readmes Validate README files have required sections
saved-views Validate saved view JSON file structure and fields
version Validate version consistency between package and changelog


NouemanKHAL (Member) left a comment

🥇

